first-person perspective
Assessing Consciousness-Related Behaviors in Large Language Models Using the Maze Test
Pimenta, Rui A., Schlippe, Tim, Schaaff, Kristina
We investigate consciousness-like behaviors in Large Language Models (LLMs) using the Maze Test, challenging models to navigate mazes from a first-person perspective. After synthesizing consciousness theories into 13 essential characteristics, we evaluated 12 leading LLMs across zero-shot, one-shot, and few-shot learning scenarios. Results showed reasoning-capable LLMs consistently outperforming standard versions, with Gemini 2.0 Pro achieving 52.9% Complete Path Accuracy and DeepSeek-R1 reaching 80.5% Partial Path Accuracy. The gap between these metrics indicates that LLMs struggle to maintain a coherent self-model throughout a solution, a fundamental aspect of consciousness. While LLMs show progress in consciousness-related behaviors through reasoning mechanisms, they lack the integrated, persistent self-awareness characteristic of consciousness.
The emergence of human-like capabilities in AI has been debated since the field's inception in the 1950s [1], [2]. An early case was ELIZA [3], a chatbot simulating a therapist. Though based on pattern matching, its responses were so convincing that Weizenbaum's secretary requested privacy for a "real conversation", showing how humans can mistakenly perceive consciousness in even the simplest AI systems.
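As a rough, assumption-laden illustration of the two path metrics quoted above, the sketch below scores a predicted move sequence against a reference solution: a complete-path score of 1.0 only if every move matches, and a partial-path score equal to the fraction of reference moves reproduced before the first mistake. The move vocabulary and exact definitions are hypothetical and may differ from the paper's.

```python
# Hypothetical path metrics for a first-person maze run; a "path" here is a list
# of moves such as ["forward", "turn_left", "forward"]. Definitions are assumed,
# not taken from the paper.

def complete_path_accuracy(predicted: list[str], reference: list[str]) -> float:
    """1.0 only if the model's entire move sequence matches the reference path."""
    return float(predicted == reference)

def partial_path_accuracy(predicted: list[str], reference: list[str]) -> float:
    """Fraction of reference moves reproduced before the first mistake."""
    correct = 0
    for pred, ref in zip(predicted, reference):
        if pred != ref:
            break
        correct += 1
    return correct / len(reference) if reference else 0.0

if __name__ == "__main__":
    ref = ["forward", "turn_left", "forward", "forward"]
    pred = ["forward", "turn_left", "turn_right", "forward"]
    print(complete_path_accuracy(pred, ref))  # 0.0
    print(partial_path_accuracy(pred, ref))   # 0.5
```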
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (3 more...)
EgoBlind: Towards Egocentric Visual Assistance for the Blind People
Xiao, Junbin, Huang, Nanxin, Qiu, Hao, Tao, Zhulin, Yang, Xun, Hong, Richang, Wang, Meng, Yao, Angela
We present EgoBlind, the first egocentric VideoQA dataset collected from blind individuals to evaluate the assistive capabilities of contemporary multimodal large language models (MLLMs). EgoBlind comprises 1,210 videos that record the daily lives of real blind users from a first-person perspective. It also features 4,927 questions directly posed, or generated and verified, by blind individuals to reflect their needs for visual assistance across various scenarios. We provide each question with an average of 3 reference answers to mitigate the subjectivity of evaluation. Using EgoBlind, we comprehensively evaluate 15 leading MLLMs and find that all models struggle, with the best performers achieving accuracy around 56%, far behind human performance of 87.4%. To guide future advancements, we identify and summarize major limitations of existing MLLMs in egocentric visual assistance for the blind and provide heuristic suggestions for improvement. With these efforts, we hope EgoBlind can serve as a valuable foundation for developing more effective AI assistants that enhance the independence of blind individuals.
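Since each EgoBlind question comes with roughly three reference answers, one minimal way to score a model answer is to count it correct if it matches any of the references. The normalized exact-match comparison below is an assumption for illustration; the benchmark may well use a stricter or model-based judge.

```python
# Sketch of multi-reference accuracy scoring; the matching rule is a simple
# normalized string comparison, assumed for illustration only.

def normalize(text: str) -> str:
    return " ".join(text.lower().strip().split())

def is_correct(model_answer: str, references: list[str]) -> bool:
    """Count the answer as correct if it matches any of the reference answers."""
    return any(normalize(model_answer) == normalize(ref) for ref in references)

def accuracy(predictions: list[str], reference_sets: list[list[str]]) -> float:
    hits = sum(is_correct(p, refs) for p, refs in zip(predictions, reference_sets))
    return hits / len(predictions) if predictions else 0.0
```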
- North America > United States (0.04)
- Asia > Singapore (0.04)
- Asia > China > Zhejiang Province > Hangzhou (0.04)
- (5 more...)
- Education (0.67)
- Health & Medicine > Consumer Health (0.46)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.95)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
Exo-ViHa: A Cross-Platform Exoskeleton System with Visual and Haptic Feedback for Efficient Dexterous Skill Learning
Chao, Xintao, Mu, Shilong, Liu, Yushan, Li, Shoujie, Lyu, Chuqiao, Zhang, Xiao-Ping, Ding, Wenbo
Imitation learning has emerged as a powerful paradigm for robot skill learning. However, traditional data collection systems for dexterous manipulation face challenges, including a lack of balance between acquisition efficiency, consistency, and accuracy. To address these issues, we introduce Exo-ViHa, an innovative 3D-printed exoskeleton system that enables users to collect data from a first-person perspective while receiving real-time haptic feedback. This system combines a 3D-printed modular structure with a SLAM camera, a motion capture glove, and a wrist-mounted camera. Various dexterous hands can be installed at the end, enabling the system to simultaneously collect the posture of the end effector, hand movements, and visual data. By leveraging the first-person perspective and direct interaction, the exoskeleton enhances task realism and haptic feedback, improving the consistency between demonstrations and actual robot deployments. In addition, it has cross-platform compatibility with various robotic arms and dexterous hands. Experiments show that the system can significantly improve the success rate and efficiency of data collection for dexterous manipulation tasks.
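To make the synchronized data stream concrete, here is a hypothetical per-timestep record for a demonstration collected with such a system; the field names, shapes, and retargeting step are illustrative assumptions, not the authors' actual data format.

```python
# Hypothetical layout of one demonstration timestep: end-effector pose from the
# SLAM camera, finger joints from the motion-capture glove, and a wrist-camera
# image. All names and shapes are assumptions for illustration.
from dataclasses import dataclass
import numpy as np

@dataclass
class DemoFrame:
    timestamp: float
    ee_pose: np.ndarray      # (4, 4) homogeneous transform of the end effector
    hand_joints: np.ndarray  # finger joint angles from the glove, e.g. shape (21,)
    wrist_rgb: np.ndarray    # (H, W, 3) image from the wrist-mounted camera

def to_robot_targets(frame: DemoFrame) -> dict:
    """Placeholder retargeting: map one recorded frame to robot commands."""
    return {"arm_pose": frame.ee_pose, "hand_command": frame.hand_joints}
```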
Entering Real Social World! Benchmarking the Social Intelligence of Large Language Models from a First-person Perspective
Hou, Guiyang, Zhang, Wenqi, Shen, Yongliang, Tan, Zeqi, Shen, Sihao, Lu, Weiming
Social intelligence is built upon three foundational pillars: cognitive intelligence, situational intelligence, and behavioral intelligence. As large language models (LLMs) become increasingly integrated into our social lives, understanding, evaluating, and developing their social intelligence is becoming increasingly important. While multiple existing works have investigated the social intelligence of LLMs, (1) most focus on a specific aspect, so the social intelligence of LLMs has yet to be systematically organized and studied; (2) most position LLMs as passive observers from a third-person perspective, such as in Theory of Mind (ToM) tests, whereas an ego-centric, first-person-perspective evaluation aligns better with actual LLM-based agent use scenarios; and (3) behavioral intelligence lacks comprehensive evaluation, especially in critical human-machine interaction scenarios. In light of this, we present EgoSocialArena, a novel framework grounded in the three pillars of social intelligence: cognitive, situational, and behavioral intelligence, aiming to systematically evaluate the social intelligence of LLMs from a first-person perspective. With EgoSocialArena, we have conducted a comprehensive evaluation of eight prominent foundation models; even the most advanced LLMs, such as o1-preview, lag behind human performance by 11.0 points.
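The first-person framing the authors argue for can be illustrated by recasting a classic third-person Theory-of-Mind item so that the evaluated model is itself a participant rather than an observer; the wording below is invented for illustration and is not an EgoSocialArena item.

```python
# Toy contrast between third-person and first-person framings of the same
# false-belief scenario; both strings are invented examples.

third_person_item = (
    "Sally puts her ball in the basket and leaves the room. "
    "While she is away, Anne moves the ball to the box.\n"
    "Question: Where will Sally look for her ball first?"
)

first_person_item = (
    "You put your ball in the basket and leave the room. "
    "While you are away, Anne moves the ball to the box, and you do not see this.\n"
    "Question: When you return, where do you look for your ball first?"
)
```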
- Leisure & Entertainment > Games (1.00)
- Education (0.68)
Diffusing in Someone Else's Shoes: Robotic Perspective Taking with Diffusion
Spisak, Josua, Kerzel, Matthias, Wermter, Stefan
Humanoid robots can benefit from their similarity to the human shape by learning from humans. When humans teach other humans how to perform actions, they often demonstrate the actions, and the learner can try to imitate the demonstration. Being able to mentally transfer a demonstration seen from a third-person perspective into how it should look from a first-person perspective is fundamental to this ability in humans. As this is a challenging task, it is often simplified for robots by creating the demonstration in the first-person perspective. Creating such demonstrations requires more effort but allows for easier imitation. We introduce a novel diffusion model aimed at enabling the robot to learn directly from third-person demonstrations. Our model learns to generate the first-person perspective from the third-person perspective by translating the size and rotations of objects and the environment between the two perspectives. This allows us to combine the benefits of easy-to-produce third-person demonstrations and easy-to-imitate first-person demonstrations. The model can either represent the first-person perspective as an RGB image or calculate the joint values. Our approach significantly outperforms other image-to-image models on this task.
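A minimal sketch of the conditioning idea is given below, assuming a diffusers-style noise scheduler (exposing add_noise and config.num_train_timesteps) and a UNet callable that takes the noisy first-person image concatenated channel-wise with the third-person view plus a timestep; the authors' actual architecture and training details may differ.

```python
# One denoising-diffusion training step for third- to first-person translation.
# `unet` and `scheduler` are placeholder objects as described in the text above.
import torch
import torch.nn.functional as F

def training_step(unet, scheduler, first_person, third_person):
    """first_person, third_person: image batches of shape (B, C, H, W)."""
    noise = torch.randn_like(first_person)
    # Sample a random diffusion timestep for each image in the batch.
    t = torch.randint(0, scheduler.config.num_train_timesteps,
                      (first_person.size(0),), device=first_person.device)
    noisy = scheduler.add_noise(first_person, noise, t)
    # Condition on the third-person view by concatenating along the channel axis.
    model_input = torch.cat([noisy, third_person], dim=1)
    pred_noise = unet(model_input, t)
    return F.mse_loss(pred_noise, noise)
```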
- Europe > Germany > Hamburg (0.04)
- Europe > Switzerland (0.04)
Can Vision-Language Models Think from a First-Person Perspective?
Cheng, Sijie, Guo, Zhicheng, Wu, Jingwen, Fang, Kechen, Li, Peng, Liu, Huaping, Liu, Yang
Vision-language models (VLMs) have recently shown promising results in traditional downstream tasks. Evaluation studies have emerged to assess their abilities, with the majority focusing on the third-person perspective, and only a few addressing specific tasks from the first-person perspective. However, the capability of VLMs to "think" from a first-person perspective, a crucial attribute for advancing autonomous agents and robotics, remains largely unexplored. To bridge this research gap, we introduce EgoThink, a novel visual question-answering benchmark that encompasses six core capabilities with twelve detailed dimensions. The benchmark is constructed using selected clips from egocentric videos, with manually annotated question-answer pairs containing first-person information. To comprehensively assess VLMs, we evaluate eighteen popular VLMs on EgoThink. Moreover, given the open-ended format of the answers, we use GPT-4 as the automatic judge to compute single-answer grading. Experimental results indicate that although GPT-4V leads in numerous dimensions, all evaluated VLMs still possess considerable potential for improvement in first-person perspective tasks. Meanwhile, increasing the number of trainable parameters has the most significant impact on model performance on EgoThink. In conclusion, EgoThink serves as a valuable addition to existing evaluation benchmarks for VLMs, providing an indispensable resource for future research in the realm of embodied artificial intelligence and robotics.
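The single-answer grading setup described above (GPT-4 as an automatic judge for open-ended answers) can be sketched roughly as follows; the prompt wording, the 1-5 scale, and the call_judge hook are assumptions for illustration, not the benchmark's actual implementation.

```python
# Sketch of LLM-as-judge single-answer grading; `call_judge` is any function
# that sends a prompt to the judge model and returns its text reply.
import re

JUDGE_TEMPLATE = """You are grading an answer to an egocentric visual question.
Question: {question}
Reference answer: {reference}
Model answer: {answer}
Reply with a single integer score from 1 (wrong) to 5 (fully correct). Score:"""

def grade(question: str, reference: str, answer: str, call_judge) -> int:
    reply = call_judge(JUDGE_TEMPLATE.format(
        question=question, reference=reference, answer=answer))
    match = re.search(r"[1-5]", reply)
    return int(match.group()) if match else 1
```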
- North America > Canada > Ontario > Toronto (0.14)
- Europe > Switzerland > Zürich > Zürich (0.14)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- (3 more...)
Semantic Segmentation for Autonomous Driving: Model Evaluation, Dataset Generation, Perspective Comparison, and Real-Time Capability
Cakir, Senay, Gauß, Marcel, Häppeler, Kai, Ounajjar, Yassine, Heinle, Fabian, Marchthaler, Reiner
Environmental perception is an important aspect within the field of autonomous vehicles that provides crucial information about the driving domain, including identifying clear driving areas and surrounding obstacles. Semantic segmentation is a widely used perception method for self-driving cars that associates each pixel of an image with a predefined class. In this context, several segmentation models are evaluated regarding accuracy and efficiency. Experimental results on the generated dataset confirm that the segmentation model FasterSeg is fast enough to be used in real time on low-power embedded devices in self-driving cars. A simple method is also introduced to generate synthetic training data for the model. Moreover, the accuracy of the first-person perspective and the bird's-eye-view perspective are compared. For a 320×256 input in the first-person perspective, FasterSeg achieves 65.44% mean Intersection over Union (mIoU), and for a 320×256 input from the bird's-eye-view perspective, it achieves 64.08% mIoU. Both perspectives achieve a frame rate of 247.11 Frames per Second (FPS) on the NVIDIA Jetson AGX Xavier. Lastly, the frame rate and accuracy of both perspectives are measured and compared on the target hardware using 16-bit (FP16) and 32-bit (FP32) floating-point arithmetic.
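For reference, the mIoU figures quoted above can be computed from a confusion matrix over predicted and ground-truth label maps, roughly as in the sketch below (class IDs and shapes are generic; this is not the authors' evaluation code).

```python
# Mean Intersection over Union (mIoU) from integer label maps of equal shape.
import numpy as np

def mean_iou(pred: np.ndarray, target: np.ndarray, num_classes: int) -> float:
    conf = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(conf, (target.ravel(), pred.ravel()), 1)  # rows: ground truth, cols: prediction
    intersection = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - intersection
    valid = union > 0  # ignore classes absent from both prediction and ground truth
    return float((intersection[valid] / union[valid]).mean())
```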
- Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
- Asia (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Transportation > Ground > Road (1.00)
- Information Technology > Robotics & Automation (0.90)
Facebook is now developing a human-like artificial intelligence called Ego4D
Facebook announced a research project Thursday that aims to develop an artificial intelligence capable of perceiving the world like a human being. The project, titled Ego4D, aims to train an artificial intelligence (AI) to perceive the world in the first-person by analyzing a constant stream of video from people's lives. This type of data, which Facebook calls "egocentric" data, is designed to help the AI perceive, remember and plan like a human being. "Next-generation AI systems will need to learn from an entirely different kind of data -- videos that show the world from the center of the action, rather than the sidelines," Kristen Grauman, lead AI research scientist at Facebook, said in the announcement. The project aims to improve AI technology's capacity to accomplish human processes by setting five key benchmarks: "episodic memory," in which the AI ties memories to specific locations and times, "forecasting," "social interaction," "hand and object manipulation" and "audio-visual diarization," in which the AI ties auditory experiences to specific locations and times.
Teaching AI to perceive the world through your eyes
AI that understands the world from a first-person point of view could unlock a new era of immersive experiences, as devices like augmented reality (AR) glasses and virtual reality (VR) headsets become as useful in everyday life as smartphones. Imagine your AR device displaying exactly how to hold the sticks during a drum lesson, guiding you through a recipe, helping you find your lost keys, or recalling memories as holograms that come to life in front of you. To build these new technologies, we need to teach AI to understand and interact with the world like we do, from a first-person perspective -- commonly referred to in the research community as egocentric perception. Today's computer vision (CV) systems, however, typically learn from millions of photos and videos that are captured in third-person perspective, where the camera is just a spectator to the action. "Next-generation AI systems will need to learn from an entirely different kind of data -- videos that show the world from the center of the action, rather than the sidelines," says Kristen Grauman, lead research scientist at Facebook.
- Asia > Singapore (0.05)
- South America > Colombia (0.04)
- North America > United States > Pennsylvania (0.04)
- (8 more...)
Facebook: Here comes the AI of the Metaverse
To operate in augmented and virtual reality, Facebook believes artificial intelligence will need to develop an "egocentric perspective." To that end, the company on Thursday announced Ego4D, a data set of 2,792 hours of first-person video, and a set of benchmark tests for neural nets, designed to encourage the development of AI that is savvier about what it's like to move through virtual worlds from a first-person perspective. The project is a collaboration between Facebook Reality Labs and scholars from 13 universities and research labs. The details are laid out in a paper lead-authored by Facebook's Kristen Grauman, "Ego4D: Around the World in 2.8K Hours of Egocentric Video." Grauman is a scientist with the company's Facebook AI Research unit.